SAMPLE-MOMENT ESTIMATES OF THE MEAN VECTOR AND THE COVARIANCE
MATRIX FOR A SAMPLE OF INCOMPLETE DATA VECTORS


1. INTRODUCTION

In this memorandum, it is supposed that $X$ is a random variable in n-dimensional
Euclidean space ($R^n$) and that one must estimate the mean vector $\mu = E(X)$ and
the covariance matrix $\Sigma = E[(X - \mu)(X - \mu)^T]$ on the basis of a given
independent sample $\mathcal{X} = \{x_k\}_{k=1,\dots,N}$ of observations on $X$. Under
normal circumstances, a satisfactory solution of this frequently encountered
estimation problem is given by the sample-moment estimates

$$m = \frac{1}{N} \sum_{k=1}^{N} x_k \tag{1}$$

$$S = \frac{1}{N} \sum_{k=1}^{N} (x_k - m)(x_k - m)^T \tag{2}$$

The estimate $m$ is unbiased and consistent.¹ The estimate $S$ is consistent,
although it is biased. In fact, $E(S) = \frac{N-1}{N}\Sigma$; hence, $\frac{N}{N-1}S$
is an unbiased estimate of $\Sigma$. If $X$ is normally distributed, then $m$ and $S$
are maximum-likelihood estimates of $\mu$ and $\Sigma$, respectively. The estimates
$m$ and $S$ have the numerically attractive property that they can be evaluated
with one pass through the observations in $\mathcal{X}$.
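The one-pass property for complete data vectors can be illustrated with a short sketch. It assumes the observations arrive as plain Python sequences of floats; the function names `sample_moments` and `unbiased` are hypothetical and chosen only for this illustration. The sketch accumulates the sum of $x_k$ and the sum of $x_k x_k^T$, then uses the identity $S = \frac{1}{N}\sum_k x_k x_k^T - m m^T$, which is algebraically equal to eq. (2).

```python
def sample_moments(xs):
    """One-pass evaluation of eqs. (1) and (2) for complete data vectors.

    xs: iterable of equal-length sequences of floats (assumed input format).
    Returns (m, S), where S is normalized by N (the biased estimate).
    """
    count = 0
    total = None   # running sum of x_k
    cross = None   # running sum of x_k x_k^T
    for x in xs:
        n = len(x)
        if total is None:
            total = [0.0] * n
            cross = [[0.0] * n for _ in range(n)]
        count += 1
        for i in range(n):
            total[i] += x[i]
            for j in range(n):
                cross[i][j] += x[i] * x[j]
    if count == 0:
        raise ValueError("empty sample")
    n = len(total)
    m = [t / count for t in total]
    # S = (1/N) sum x_k x_k^T - m m^T, algebraically equal to eq. (2)
    S = [[cross[i][j] / count - m[i] * m[j] for j in range(n)]
         for i in range(n)]
    return m, S


def unbiased(S, N):
    """Rescale S by N/(N-1) to remove the bias noted in the text."""
    return [[N / (N - 1) * s for s in row] for row in S]
```

Accumulating raw cross-products keeps the sketch close to the text; for data with a large mean relative to its spread, a running-average form may be numerically preferable.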

This estimation problem requires special treatment if a given sample of
observations contains vectors which are incomplete in the sense that some of their
components are unknown or missing. In the following section, expressions are
offered for sample-moment estimates of the components of $\mu$ and $\Sigma$
determined by those sample observations for which the appropriate components are
known. As with the estimates given by eqs. (1) and (2) for complete data vectors,
the resulting estimate of $\mu$ is unbiased and consistent, and the resulting
estimate of $\Sigma$ is consistent, although biased. (As with the estimate given by
eq. (2), the bias is removable.) A procedure is then outlined for evaluating these
estimates which requires only one pass through the observations in $\mathcal{X}$.

¹ For convenience, estimates and their associated estimators are identified.

2. ESTIMATES OF $\mu$ AND $\Sigma$

Assume that the given independent sample $\mathcal{X} = \{x_k\}_{k=1,\dots,N}$ of
observations on $X$ contains vectors which are incomplete in the above sense, and
denote the i-th component of $\mu$ by $\mu_i$ and the ij-th entry of $\Sigma$ by
$\sigma_{ij}$ ($1 \le i, j \le n$). Expressions are derived below for sample-moment
estimates of $\mu_i$ and $\sigma_{ij}$ determined by those observations in
$\mathcal{X}$ for which the appropriate components are known.

To obtain the expression for the sample-moment estimate of $\mu_i$, let $x_{ki}$
denote the i-th component of $x_k$ ($1 \le k \le N$) and let $\mathcal{X}_i$ be the
set of observations $x_k$ in $\mathcal{X}$ for which $x_{ki}$ is known. Denoting the
number of observations in $\mathcal{X}_i$ by $N_i$, the expression

$$m_i = \frac{1}{N_i} \sum_{x_k \in \mathcal{X}_i} x_{ki} \tag{3}$$

is immediately obtained as the desired sample-moment estimate of $\mu_i$. The
estimate $m_i$ is clearly unbiased and consistent.
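As a minimal sketch of eq. (3), the component means can be computed by skipping unknown components. The encoding of a missing component as `None` and the name `component_means` are assumptions of this illustration, not part of the memorandum.

```python
def component_means(xs):
    """Eq. (3): average each component over the observations where it is known.

    xs: list of equal-length sequences; a missing component is None
        (an assumed encoding for this sketch).
    Returns (m, N), where N[i] is the number of observations in X_i.
    """
    n = len(xs[0])
    sums = [0.0] * n
    N = [0] * n
    for x in xs:
        for i in range(n):
            if x[i] is not None:      # x_k belongs to X_i
                N[i] += 1
                sums[i] += x[i]
    # m_i is undefined when component i is never observed; report None then.
    m = [sums[i] / N[i] if N[i] > 0 else None for i in range(n)]
    return m, N
```

For the three observations `[1.0, None]`, `[3.0, 4.0]`, `[None, 8.0]`, each component is known in two observations, so each mean is taken over two values.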

To obtain the expression for the sample-moment estimate of $\sigma_{ij}$, set
$\mathcal{X}_{ij} = \mathcal{X}_i \cap \mathcal{X}_j$ and denote by $N_{ij}$ the
number of observations in $\mathcal{X}_{ij}$. The expression

$$s_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} (x_{ki} - m_i)(x_{kj} - m_j) \tag{4}$$

is offered as the desired sample-moment estimate of $\sigma_{ij}$. In this
expression, $m_i$ and $m_j$ are, respectively, the estimates of $\mu_i$ and $\mu_j$
given (mutatis mutandis) by eq. (3). The estimate $s_{ij}$ is easily seen to be
consistent, although it is biased. Indeed,


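$$E(s_{ij}) = \left(1 - \frac{1}{N_i} - \frac{1}{N_j} + \frac{N_{ij}}{N_i N_j}\right) \sigma_{ij}$$

Note that the bias of $s_{ij}$ is removable in the sense that one may easily obtain
from $s_{ij}$ the unbiased estimate

$$\frac{N_i N_j}{N_i N_j - N_i - N_j + N_{ij}} \, s_{ij}$$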
An alternate sample-moment expression is

$$s_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} \left(x_{ki} - m_i^{(j)}\right)\left(x_{kj} - m_j^{(i)}\right) \tag{5}$$

where $m_i^{(j)}$ and $m_j^{(i)}$ are the averages of $x_{ki}$ and $x_{kj}$,
respectively, over the observations in $\mathcal{X}_{ij}$.

One might consider the estimate given by eq. (4) to be more appealing than
that given in eq. (5). For one thing, if $s_{ij}$ is given by eq. (5), then

$$E(s_{ij}) = \frac{N_{ij} - 1}{N_{ij}} \, \sigma_{ij}$$

and it follows that the bias of the estimate given by eq. (4) is less than that of
the estimate given by eq. (5). Of course, this is not a very serious
consideration, since the bias of both estimates is removable. A more serious
consideration is that the estimate given by eq. (4) seems likely to have
smaller variance than that given by eq. (5). This has not been proved;

however, the following heuristic arguments are offered: Since $m_i$ and $m_j$
have smaller variances than $m_i^{(j)}$ and $m_j^{(i)}$, respectively, $m_i$ and
$m_j$ should contribute less to the variance of the sum in eq. (4) than
$m_i^{(j)}$ and $m_j^{(i)}$ contribute to the variance of the sum in eq. (5). Also,
if $X$ is normally distributed and $\mu_i$ and $\mu_j$ are known, then the
maximum-likelihood estimate of $\sigma_{ij}$ based on $\mathcal{X}_{ij}$ is

$$\hat{\sigma}_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} (x_{ki} - \mu_i)(x_{kj} - \mu_j)$$

This estimate is unbiased and, therefore, has minimum variance among all
unbiased estimates. Since $m_i$ and $m_j$ have smaller variances than $m_i^{(j)}$
and $m_j^{(i)}$, respectively, the estimate given by eq. (4) should differ less than
that given by eq. (5) from the minimum-variance estimator $\hat{\sigma}_{ij}$.
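The two covariance estimates discussed in this section can be compared with a short sketch. It assumes that eq. (4) centers the products over $\mathcal{X}_{ij}$ with the all-available means $m_i$, $m_j$, and that eq. (5) centers them with means taken over $\mathcal{X}_{ij}$ only. Missing components are encoded as `None`, and every function name is hypothetical. The bias factors are those that follow when the observations are independent and the pattern of missing components does not depend on the data values; under those assumptions the factor for eq. (4) is never farther from 1 than the factor for eq. (5).

```python
def pairwise_cov(xs, i, j, all_available_means=True):
    """Biased estimate s_ij from the observations in X_ij.

    all_available_means=True  centers with m_i, m_j from X_i, X_j (eq. (4)).
    all_available_means=False centers with means over X_ij only (eq. (5)).
    Returns None when components i and j are never observed together.
    """
    both = [x for x in xs if x[i] is not None and x[j] is not None]  # X_ij
    n_ij = len(both)
    if n_ij == 0:
        return None
    if all_available_means:
        xi = [x[i] for x in xs if x[i] is not None]   # component i over X_i
        xj = [x[j] for x in xs if x[j] is not None]   # component j over X_j
    else:
        xi = [x[i] for x in both]
        xj = [x[j] for x in both]
    m_i = sum(xi) / len(xi)
    m_j = sum(xj) / len(xj)
    return sum((x[i] - m_i) * (x[j] - m_j) for x in both) / n_ij


def bias_factor_eq4(n_i, n_j, n_ij):
    """E(s_ij) / sigma_ij for eq. (4), under the stated assumptions."""
    return 1.0 - 1.0 / n_i - 1.0 / n_j + n_ij / (n_i * n_j)


def bias_factor_eq5(n_ij):
    """E(s_ij) / sigma_ij for eq. (5), under the stated assumptions."""
    return (n_ij - 1.0) / n_ij
```

Dividing an estimate by its bias factor gives the corresponding unbiased estimate, provided the factor is nonzero.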


3. A ONE-PASS EVALUATION PROCEDURE 


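A procedure is now outlined for evaluating the sample-moment estimates given
by eqs. (3) and (4) for all i and j between 1 and n. This procedure requires
only one pass through the observations in the sample set $\mathcal{X}$. Assume that
the sets $\mathcal{X}_i$ and $\mathcal{X}_{ij}$ have been determined, and rewrite
eq. (4) as

$$s_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} x_{ki} x_{kj} - m_i m_j^{(i)} - m_j m_i^{(j)} + m_i m_j \tag{6}$$

with $m_i^{(j)}$ and $m_j^{(i)}$ as in eq. (5). Since $s_{ij} = s_{ji}$, it is
necessary to evaluate the quantities $s_{ij}$ only for $j \ge i$. Note that

$$s_{ii} = \frac{1}{N_i} \sum_{x_k \in \mathcal{X}_i} x_{ki}^2 - m_i^2$$

For convenience in describing the procedure, set

$$T_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} x_{ki} x_{kj} \tag{7}$$

(for $j = i$, $\mathcal{X}_{ii} = \mathcal{X}_i$ and $N_{ii} = N_i$).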
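The rewriting of eq. (4) can be checked term by term. The sketch below assumes that eq. (4) has the form $s_{ij} = \frac{1}{N_{ij}} \sum_{x_k \in \mathcal{X}_{ij}} (x_{ki} - m_i)(x_{kj} - m_j)$ and that $m_i^{(j)}$ and $m_j^{(i)}$ denote the averages of $x_{ki}$ and $x_{kj}$ over $\mathcal{X}_{ij}$; all sums run over $x_k \in \mathcal{X}_{ij}$.

$$
\begin{aligned}
s_{ij} &= \frac{1}{N_{ij}} \sum (x_{ki} - m_i)(x_{kj} - m_j) \\
       &= \frac{1}{N_{ij}} \sum x_{ki} x_{kj}
          - m_j \, \frac{1}{N_{ij}} \sum x_{ki}
          - m_i \, \frac{1}{N_{ij}} \sum x_{kj}
          + m_i m_j \\
       &= T_{ij} - m_j m_i^{(j)} - m_i m_j^{(i)} + m_i m_j
\end{aligned}
$$

For $j = i$ the set $\mathcal{X}_{ii}$ coincides with $\mathcal{X}_i$, so $m_i^{(i)} = m_i$ and the last line reduces to $s_{ii} = T_{ii} - m_i^2$.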


PROCEDURE:

1. Initialize

(a) $N_i = 0$, $m_i = 0$ for $1 \le i \le n$

(b) $T_{ij} = 0$ for $1 \le i \le j \le n$

(c) $N_{ij} = 0$, $m_i^{(j)} = 0$, $m_j^{(i)} = 0$ for $1 \le i < j \le n$

2. Carry out the algorithm described in figure 1. This algorithm produces
$m_i$ and $T_{ii}$ for $1 \le i \le n$ and $m_i^{(j)}$, $m_j^{(i)}$, and $T_{ij}$
for $1 \le i < j \le n$.

3. For $1 \le i \le n$, obtain $m_i$ as a direct output of the algorithm and obtain
$s_{ii} = T_{ii} - m_i^2$ using eq. (7). For $1 \le i < j \le n$, obtain $s_{ij}$
from eq. (6).

For k = 1, ..., N, do the following:

  For i = 1, ..., n, do the following:

    If $x_k \in \mathcal{X}_i$, continue; otherwise, return.
    Do the following updates:

      $N_i = N_i + 1$

      $m_i = \frac{N_i - 1}{N_i} m_i + \frac{1}{N_i} x_{ki}$

      $T_{ii} = \frac{N_i - 1}{N_i} T_{ii} + \frac{1}{N_i} x_{ki}^2$

    If i < n, continue; otherwise, return.

    For j = i + 1, ..., n, do the following:

      If $x_k \in \mathcal{X}_{ij}$, continue; otherwise, return.

      Do the following updates:

        $N_{ij} = N_{ij} + 1$

        $m_i^{(j)} = \frac{N_{ij} - 1}{N_{ij}} m_i^{(j)} + \frac{1}{N_{ij}} x_{ki}$

        $m_j^{(i)} = \frac{N_{ij} - 1}{N_{ij}} m_j^{(i)} + \frac{1}{N_{ij}} x_{kj}$

        $T_{ij} = \frac{N_{ij} - 1}{N_{ij}} T_{ij} + \frac{1}{N_{ij}} x_{ki} x_{kj}$

Figure 1.- Algorithm.
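The one-pass procedure can be sketched in Python as follows. This is an illustrative reading of the procedure, not a transcription of the figure: missing components are assumed to be encoded as `None`, the name `one_pass_estimates` is hypothetical, and each running average is updated in the form `a += (x - a) / count`, which is algebraically the same as `a = ((count - 1) / count) * a + x / count`. The final step assumes that the rewritten form of eq. (4) is $s_{ij} = T_{ij} - m_i m_j^{(i)} - m_j m_i^{(j)} + m_i m_j$.

```python
def one_pass_estimates(xs, n):
    """Single pass over xs, keeping the running averages of the procedure.

    xs: iterable of length-n sequences; a missing component is None.
    Returns (m, s): m[i] estimates mu_i, s[i][j] is the biased estimate of
    sigma_ij; entries that cannot be formed from the data are None.
    """
    N = [0] * n                           # N_i
    m = [0.0] * n                         # m_i
    Nij = [[0] * n for _ in range(n)]     # N_ij, used for i < j
    T = [[0.0] * n for _ in range(n)]     # T_ij, with T_ii on the diagonal
    mc = [[0.0] * n for _ in range(n)]    # mc[i][j] = m_i^(j), mean of x_ki over X_ij

    for x in xs:
        for i in range(n):
            if x[i] is None:              # x_k not in X_i: go to next i
                continue
            N[i] += 1
            m[i] += (x[i] - m[i]) / N[i]
            T[i][i] += (x[i] * x[i] - T[i][i]) / N[i]
            for j in range(i + 1, n):
                if x[j] is None:          # x_k not in X_ij: go to next j
                    continue
                Nij[i][j] += 1
                c = Nij[i][j]
                mc[i][j] += (x[i] - mc[i][j]) / c
                mc[j][i] += (x[j] - mc[j][i]) / c
                T[i][j] += (x[i] * x[j] - T[i][j]) / c

    s = [[None] * n for _ in range(n)]
    for i in range(n):
        if N[i] > 0:
            s[i][i] = T[i][i] - m[i] ** 2
        for j in range(i + 1, n):
            if Nij[i][j] > 0:
                v = T[i][j] - m[i] * mc[j][i] - m[j] * mc[i][j] + m[i] * m[j]
                s[i][j] = s[j][i] = v
    means = [m[i] if N[i] > 0 else None for i in range(n)]
    return means, s
```

Only running averages and counts are stored, so memory use grows with $n^2$ and not with the number of observations.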